PREFACE



Criterion-referenced testing is the newest and fastest growing
accountability technique today in our elementary and secondary schools
across the nation. States, large city school systems, counties and
small local school districts all are interested in utilizing
criterion-referenced testing in their assessment programs. Because of this
trend, the representatives of the member states of the Mid-Atlantic
Region Interstate Planning Project requested that a part of the
December 1972 meeting of the project be allocated to a presentation
of papers related to criterion-referenced testing. It was decided
that individuals on the staffs of project member states were, as a
result of their experiences, well qualified on certain aspects of
criterion-referenced measurement. Therefore, the program was
planned to include reports from four member states and to provide
ample opportunity for discussion.

The frame of reference for all presentations was the definition
of criterion-referenced tests given by Robert Glaser in the 1971
edition of Educational Measurement: "a criterion-referenced test is
one that is deliberately constructed to yield measurements that are
directly interpretable in terms of specific performance standards."
The scope of the papers ranged from the rationale for criterion-referenced
testing, to comparisons with standardized norm-referenced tests, to
development plans and activities, to recommendations related to
criterion-referenced testing.

The collection which is presented here is offered to educators
across the country with the hope that the experiences of the
Mid-Atlantic Interstate Project member states, their school systems and
their staffs, will be of assistance to other educators in their
work with assessment programs.

Mildred Pivetz Cooper

INTRODUCTION AND BACKGROUND

Accountability is an old concept in American education that is
fathering a new generation of educational processes as still more
strategies are aimed at supplying adequate and meaningful accountability
models. Cost effectiveness has long been the factor which has provided
the thrust for fiscal and management accountability efforts. Since the
1960's, additional thrust has been supplied through the concept of
performance accountability, which has grown with the advent of federal
programs that have had evaluation measures built in. Testing programs
have received a considerable impetus as a result of these program
requirements.

Standardized tests have been widely used and just as widely
interpreted as school districts have attempted to evaluate pupil progress and
program effectiveness through results of standardized norm-referenced
tests. For example, Kentucky has moved from twenty-six (26) pilot
testing programs testing 11,447 pupils in the school year 1965-66 to one
hundred and fifty-six (156) testing programs in LEAs testing 200,000
pupils in the 1970-71 school year. The Elementary and Secondary Education
Act, Title I requirements have further encouraged the use of sub-tests
of the standardized tests to evaluate specific goals.

Standardized norm-referenced achievement tests have become widely
used throughout the country in testing programs designed to measure
student performance as a criterion for accountability. From a
psychometric point of view, much has been written about the dangers of using
standardized norm-referenced test scores to evaluate the progress of
individuals.

Robert Glaser was among the first to move in the promising direction
of utilizing test items that are derived directly from a specific,
well-defined objective or performance standard. He pointed out the distinction
between this type of test, called "criterion-referenced," which
measures what the pupil can do, and norm-referenced tests, which compare
a pupil's progress with that of others (Lipe and Jung, 1971, RER, October
1971). Thus, the intended use of the test results determines the type
of test which will be most appropriate. Norm-referenced tests can
provide normative data to compare school systems within the nation, to
evaluate programs within a system and, to a lesser extent, to indicate pupil
achievement ranking within a grade or a classroom. Criterion-referenced
tests offer a more specific and valid means of assessing individual
performance and the accomplishment of specific program goals.

The current trend in education is toward precision in the determination
of program goals in terms of behavioral objectives and measurable
operational terms. This has created new problems in evaluation. The
norm-referenced tests failed to adequately test these goals, which were
different for each program. Thus, the selection of adequate and appropriate
custom-designed measurement devices which could measure attainment of
specific objectives became a pressing need. Many State Departments of
Education have faced assessment problems as a result of time
constraints, inadequate budgets, and the speed with which many assessments
have been mandated. The standardized testing resources at hand have been
utilized with varying degrees of effectiveness in meeting assessment
needs. Adaptation of reliable and accepted testing materials has
resulted. Adapting test items to measure specific program outcomes, a new
concept in utilizing existing standardized testing materials, has
resulted in criterion-referenced tests that adequately meet accountability
needs.

Defense of Norm-Referenced and Criterion-Referenced Tests



A. V. Nitko has concluded in his definitions of criterion-referenced
tests that, "In short it is the use to which test results are put that
determines their nature and the construction methodology." In instruction,
various procedures cannot be considered independently of the instructional
context in which they will be used. Particularly important is the
integration of test design with instructional design. This consideration
has led educators to develop unique criterion-referenced tests that focus
on an individual's performance attainment in mastering program objectives.

Standardized norm-referenced tests have earned a respected position
in educational achievement testing programs when their intended uses have
been carefully observed. The expertise of those who construct and revise
the major published standardized tests is beyond that of any organization
whose sole purpose is other than test construction. Despite the
disillusionment experienced by many educators with standardized norm-referenced
tests, the facts support their use when they are appropriately selected
for the intended use, administered and interpreted correctly, and when
the results are reported adequately and understandably. There has thus
far been no acceptable alternative offered by the critics to accomplish
the intended uses of standardized norm-referenced tests. These uses are
specifically to provide normative data to compare school systems within
the nation or lesser geographic regions, and to provide bench marks of pupil
achievement in broad general knowledge bases when compared to those of
other pupils, individually or in groups. Such bench marks of comparative
rankings can point out weaknesses in programs which may be meeting local
needs yet be ignoring the mobility of American families and the
increasing need for competencies in effecting smooth transitions and adapting
to programs in education in other school systems nationally. Thus, there
is a specific use for which standardized tests are intended. This use
deals with broad norm bands which are intended to show general comparative
trends in educational achievement when compared with the entire reference
group.

Numerous critics attack standardized norm-referenced tests on the
grounds that they are not culture free and that they fail to validly test
the non-reader or the poor reader. All written tests, regardless of their
origin, have failed to adequately overcome cultural bias and reading
problems when used with large groups of pupils. However, measurement
problems arise when faulty uses and uninformed, short-sighted
interpretations are made with the scores of these pupils. Until more perfect
instruments are devised, the standardized norm-referenced tests will
continue to provide the best normative data for comparative studies in
pupil achievement.

Attempts to measure the specific effects of an educational program 
on an individual must seek a more precise instrument of assessment which 
is custom designed to measure that particular program. Criterion- 
referenced tests offer a more specific and valid means of assessing indi- 
vidual performance and the accomplishment of program goals. 

A criterion-referenced test is one that is deliberately constructed
to yield data that are directly interpretable in terms of specified
performance standards. This type of test is not designed to facilitate
individual difference comparisons such as the relative standing of a pupil
in a norm group or population. Such tests are not designed to enable
one to speculate on a pupil's relative standing with respect to a
variable such as reading ability. Instead, criterion-referenced tests are
concerned with measuring an individual's performance relative to a
specified domain of tasks which includes both content and process.
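
The two kinds of interpretation can be contrasted in a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any program described in these papers: the objective names, the item responses, the norm-group scores, and the 80 percent mastery cutoff are all hypothetical.

```python
from bisect import bisect_left


def mastery_report(item_results, cutoff=0.8):
    """Criterion-referenced reading of a test.

    item_results maps each objective to a list of 1/0 item scores.
    The cutoff (an assumed standard) is applied to each objective
    separately; no other pupil's score enters the decision.
    """
    report = {}
    for objective, responses in item_results.items():
        proportion = sum(responses) / len(responses)
        report[objective] = (proportion, proportion >= cutoff)
    return report


def percentile_rank(raw_score, norm_group_scores):
    """Norm-referenced reading of the same test.

    Returns the percent of the norm group scoring below raw_score.
    """
    ordered = sorted(norm_group_scores)
    below = bisect_left(ordered, raw_score)
    return 100.0 * below / len(ordered)


# Hypothetical pupil: 1 = item correct, 0 = item incorrect.
pupil = {
    "adds two-digit numbers": [1, 1, 1, 1, 1],
    "subtracts with regrouping": [1, 0, 1, 0, 0],
}
# Hypothetical raw scores of a norm group on the same ten items.
norm_group = [3, 4, 5, 5, 6, 6, 7, 8, 9, 10]

report = mastery_report(pupil)
raw = sum(sum(responses) for responses in pupil.values())
rank = percentile_rank(raw, norm_group)
```

The mastery report states what the pupil can and cannot yet do, objective by objective, while the percentile rank states only where the total score falls among other pupils.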

Glaser was among the first to move in the promising direction of
utilizing test items that were derived directly from the content of the
behavior categories that were to be measured. He pointed out the
distinction between the criterion-referenced tests which measured what a
pupil could do and norm-referenced tests which compared a pupil's progress
with that of others (Lipe and Jung, 1971, RER, October 1971).

Thus, criterion-referenced tests offer distinct advantages when used
to measure goal achievement and program effectiveness, to measure
individual achievement in mastering the program objectives, and to gain
information about the placement of an individual in a continuum of specified
skills within a program. Individualized instruction is particularly
enhanced by criterion-referenced tests. Small increments in behavior can
be detected by periodic small-scale tests, enabling more frequent
opportunities for incentive delivery. Further, when items are constructed
to directly measure the degree of attainment of various behavioral goals
of a program, an interesting marriage is achieved between constant
behavioral observation and sporadic evaluation via norm-referenced
achievement tests (Lipe and Jung, 1971, RER, p. 274).

As the feasibility of individualized instruction increases,
knowledge of an individual learner's position in the group becomes less
important than knowledge of the competencies that the individual does or
does not possess. Hence, it is likely that educational assessment will
require norm-referenced information in addition to criterion-referenced
information.

The distinction between norm-referenced achievement tests and
criterion-referenced tests can be found by examining (a) the purpose for
which the test was constructed, (b) the manner in which it was
constructed, (c) the specificity of the information yielded about the domain of
instructionally relevant tasks, (d) the generalizability of test
performance information to the domain, and (e) the use to be made of the test
information.

Justification for the Adaptation of Norm-Referenced Test Items
as a Criterion-Referenced Measure

Recognizing the professional skills of the psychometrists and 
statisticians found on the staffs of the recognized national test publish- 
ers, educators will be well advised to utilize the services of test con- 
sultants in devising criterion-referenced tests from those materials 
already purchased within a system. Few school systems or State Depart- 
ments of Education have the staff, the financial resources, or the time 
to devote to the development of valid criterion referenced assessment 
materials. 

In utilizing the professional services of testing consultants, the
teachers, administrators or State Departments of Education should
carefully examine the behavior categories that are to be specified in a test
outline. A systematic plan must be devised to make sure that each
tryout form of the test includes a representative sample of items in the
behavior categories. In making this plan, the domains of items must be
carefully examined and stratified to allow for a representative sampling.
The terminal objectives or desired outcomes must be stated, and the
behavior which defines each point along the achievement continuum must be
carefully defined. The test items that test these behaviors are then
selected with the guidance of the test consultants. The validation of the
resulting test is established by the test publishers, who provide further
services throughout the administration, scoring, interpretation and
evaluation of the results.
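
One way to picture the stratified plan described above is the following Python sketch. It is only an illustration under stated assumptions: the item pool, the behavior category names, and the number of items drawn per category are hypothetical, and simple random selection within each category stands in for the guided selection that test consultants would actually provide.

```python
import random
from collections import defaultdict


def stratified_sample(items, per_category, seed=0):
    """Draw a tryout form with a fixed number of items per category.

    items is a list of (item_id, category) pairs; per_category maps
    each behavior category to the number of items wanted from it.
    """
    rng = random.Random(seed)  # seeded so a form can be reproduced
    strata = defaultdict(list)
    for item_id, category in items:
        strata[category].append(item_id)

    form = {}
    for category, count in per_category.items():
        pool = strata[category]
        if count > len(pool):
            raise ValueError(f"not enough items in category {category!r}")
        form[category] = sorted(rng.sample(pool, count))
    return form


# Hypothetical pool of 15 mathematics items in three behavior categories.
categories = ["addition"] * 6 + ["subtraction"] * 4 + ["fractions"] * 5
items = [(f"M{i:02d}", cat) for i, cat in enumerate(categories)]

plan = {"addition": 3, "subtraction": 2, "fractions": 2}
form = stratified_sample(items, plan)
```

Because every category in the plan contributes its stated number of items, no behavior category can be left out of a tryout form by chance, which is the point of stratifying before sampling.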

Some areas of skill mastery are more readily adapted from
standardized norm-referenced tests than others. The mathematics sub-tests are
examples of this adaptability. It may be that a criterion-referenced
test covering a wide domain is not likely to provide data that
satisfactorily fulfills the basic purpose of such tests. It is suggested that
for any given domain, a coordinated set of diagnostic sub-tests should be
available, each of which is made up of items that are homogeneous in the
sense that they test performance on a specific behavior or on a cluster
of behaviors that are taught as a unit.

There are numerous considerations involved in creating valid tests.
Considerations involve such matters as semantics, cultural bias,
statistics, levels of difficulty, comprehensive coverage of domains, and
even that of appealing format. Norm-referenced tests either have already
met the publisher's criteria in such matters or the publisher can provide
services to address these matters when a criterion-referenced test is
adapted from items found in existing norm-referenced tests. Few
professional educators could devote their time and limited resources to the
production of criterion-referenced tests that could surpass the product
that is developed jointly by educators and a testing publisher with its
specialized resources.

Another factor favoring the adaptation of criterion-referenced tests
from norm-referenced tests is that a variety of test forms already
exists at each level of difficulty. Most individualized
instruction utilizes pre-tests, terminal tests done immediately after
finishing a program, and post-testing done some time later.
Norm-referenced tests are readily adaptable for assessing an individual's progress
over a continuum of learning skills within each domain.

The most attractive factors involved in adapting existing
norm-referenced test items to criterion-referenced tests are unquestionably the
time-saving and the money-saving possibilities. Much has already been invested
in extensive standardized testing programs in many school systems. Further
use of these materials to satisfy assessment needs through careful
adaptation of the materials to accommodate the intended purpose of the tests is
logical and economical.



These factors, then, seem to support the practice of utilizing the
materials at hand to develop criterion-referenced test items:

First - The need for criterion tests will increase as more
systems adopt performance accountability models.

Second - Standardized norm-referenced tests are accepted
and are, in fact, improving constantly.

Third - Educators can coordinate their instruction expertise
with the test construction expertise of test
publishers to produce an appropriate and satisfactory
testing tool.

Fourth - The testing consultants can handle the technical
considerations in test construction and scoring
where few teachers feel secure in doing so.

Fifth - Appropriate, economical tests can be made available
when needed without tedious and expensive
delays.

Sixth - Test publishers have multiple forms of tests
available to meet the needs for frequent
criterion-referenced tests.

And finally - A school system has established a working
relationship with a consulting service that can help teachers avoid problems in
evaluation and will be available to advise as the programs progress
and the final evaluation becomes necessary.



December 1972

Criterion-Referenced Test Development -
A Contractual Arrangement Between the
Public Schools of the District of Columbia
and a Commercial Test Publisher



Mildred P. Cooper 

Assistant Superintendent 

for Research and Evaluation

Public Schools of the District of Columbia 



FOREWORD

Everyone interested in educating youth is interested in developing
as meaningful a program for the individual as is possible. To do this
we must understand the needs of the individual student. In speaking of
the instructional program needs of this student we must find a way to
discover his strengths and his weaknesses in terms of the educational
program which we offer him. Increasingly, school systems are turning
to criterion-referenced testing to provide this necessary diagnostic
information.

Ahead of other urban school systems and most school systems in the
nation in recognizing the value and the relevancy of criterion-referenced
testing, the Public Schools of the District of Columbia began in the fall
of 1970 to develop such a testing program for its students. As the
coordinator of that test development effort, in which we utilized a
commercial establishment, I have been requested to give a historical
description and a chronological account of our experiences. The account is
brief and yet is detailed.

CRITERION-REFERENCED TEST DEVELOPMENT IN READING AND MATHEMATICS



In 1970 the Board of Education of the District of Columbia engaged
the services of Dr. Kenneth Clark to develop a plan to improve the
academic achievement of the students in the District schools. There
were many aspects of the plan; however, I will refer here only to
those relating to the topic under discussion. The plan called for the
establishment of minimum floors of achievement in reading and mathematics
and also called for the administration of standardized achievement tests
three times a year: in the fall, in January, and in the spring.

As you probably know from the reports in the news media, great
turmoil occurred in the District of Columbia as a result of the adoption
of the Clark Plan by the D. C. Board of Education. Teachers and other
school staff had not been involved in the development of the "Design for
Excellence" and therefore were especially antagonistic to certain features.
Nevertheless, school staff did carry out the responsibilities that they
had been assigned.

The reading specialists and the mathematics specialists in those
subject field departments developed a series of sequential skills in
reading and in mathematics at each grade level and designated those to
be considered as minimum floors. These were then issued to the field by
the Division of Instruction for use by teachers during the school year
1970-71.

In the meantime, the Pupil Appraisal Section of the Department of
Pupil Personnel Services established the schedule for the three-time
testing. The first administration of the standardized achievement tests
was scheduled for late September 1970. When the time came, many
teachers refused to administer the tests and the Washington Teachers' Union
demanded that the Superintendent of Schools, Dr. Hugh J. Scott, establish
a Union-Board Testing Committee to come to some agreement on the testing
policies and program. The Superintendent did establish that committee
and designated me as chairman. After continuous meetings over a
two-week period, a set of recommendations was proposed to the Superintendent.
With almost no changes, the Superintendent approved these recommendations
and they became the policies on testing for the Public Schools of the
District of Columbia.




Within the testing policies, it was agreed that system-wide
standardized achievement testing would not occur more than twice a year and
that the school system would begin the development of instruments which
would more relevantly measure the progress of the students in the
District of Columbia. Conferences were held with measurement consultants,
and on the advice of Dr. Ralph Tyler a decision was made to use a
different approach to the D. C. Public Schools' testing program. The
direction recommended was the development of "mastery" or
"criterion-referenced" tests related to the specific instructional objectives of
the D. C. Public Schools.

After a further decision was made to limit the criterion-referenced
test development to the areas of reading and mathematics, Superintendent
Scott appointed me to coordinate the task. That was in December 1970.
Since a contract was in effect at that time with CTB/McGraw Hill, the
logical approach was to negotiate the substitution of the development of
the criterion-referenced tests for the already contracted materials and
scoring services for the no longer desired third administration of the
standardized achievement tests for that school year. It was then that
I began my discussions with a representative of the company. After basic
information had been communicated to the company, a meeting was held
with the heads of the Division of Instructional Services, Pupil
Personnel Services, the Pupil Appraisal Section and the Departments of
Research and Evaluation. Certain agreements on the plan of test
development were reached. Subsequent to the meeting, the CTB representative and
I had many conferences on procedures, funding, content, and other related
items.

A proposal was developed and submitted by the California Test Bureau/
McGraw Hill in January 1971. I forwarded copies of the proposal to the
department heads who had been involved in the test development planning
meeting. After their review, comments and appropriate changes, the
proposal was submitted to the D. C. Public Schools Contracts Division for
the development of the contract.

With a contract document which spelled out most of the terms, I met
with the District of Columbia Government Negotiated Services Chief, a
Vice-President of CTB/McGraw Hill, the D. C. Public Schools Contracts
Specialist and the designated Project Director for CTB/McGraw Hill. The
purpose of this meeting was to "hammer" out a clause which would permit
the D. C. Public Schools to recover the contractual costs of the test
development through a reimbursement or discount on the purchase of the
test instruments over a period of time. Agreement was reached, and by
the end of 1973 the total contractual development costs will have been
recovered by the D. C. Public Schools.

My review so far has been an account of the events and the proce- 
dures leading up to the development of the contractual agreement with 
the test publisher. Although this phase took a great deal of time both 
on my part and that of other school staff, it was small in comparison
to the amount of time and effort required in the implementation of the 
contract. The greatest problem that I encountered in the criterion- 
referenced test development project was that no resources were allocated 
for the test development other than the limited funds for the contractor. 
The work done by all of us from the school system was in addition to our 
regular responsibilities and assignments. Test development is much too 
complex a task not to have individuals provided with ample work time to 
carry out the task. Most people outside the field of measurement and 
evaluation do not understand the complexities of test development. 

The initial step in the implementation of the contract was the
development of a work flow chart by the CTB Project Director and me. A
copy of this chart appears on the next page. Looking at the chart and
with the amplifications which follow, the chronology, the problems and
the actions of the project can be traced:

2.0 Review of D. C. objectives -

2.1 Identify D. C. objectives - 

The District of Columbia objectives in reading and mathematics 
had to be thoroughly reviewed to determine the congruence between 
objectives and the instructional program then operating.

a) The first task was to get written verification and confirmation 
of objectives. 

b) The revision of mathematics objectives in 1971 caused delays in 
the schedule. 

c) The committees of reading and mathematics specialists then re- 
viewed objectives to be sure all were in line with instruction- 
al program goals and designs. 

d) As the coordinator I tried unsuccessfully to get funding for
work during the summer of 1971; therefore, reading specialists
worked whenever they could fit it into their schedules.

2.2 Develop test specifications - 

In all test instrument development it is necessary to identify
certain factors which influence the format and content of the
instrument. In this test instrument development the following were
considered:

a) Level of tests 

b) Number of objectives to be measured 

c) Overlap between levels 

d) Number of items per level

e) Types of response 

f) Problems of non-graded and other class structure 

The above items were discussed at meetings with testing staff and 
instructional personnel. 

2.3 Develop test items - 

Test items had to be written to measure the objectives from 2.1 
and to meet the specifications of 2.2.

a) Decisions on factors in 2.2 had to be observed.

b) Test items used were those developed by teachers, test
specialists and subject field specialists. Items included were
validated items.

c) Review was made by D. C. Public School reading and mathematics 
specialists. 

d) In reading, selections to be used were scrutinized for
appropriateness.

e) Where the specialists felt changes were needed in items or
selections, such suggestions were forwarded to CTB. In all
instances those changes were incorporated.

2.4.1 Construct pilot instruments - 

Because of problems of vocabulary level and approach at the
early childhood years, it was mutually agreed that a pilot study
was mandatory if an appropriate instrument were to be developed.

a) Tests were for grades Kindergarten, 1, 2 and 3.

b) New test items were developed. 

c) Conferences were held with D. C. school personnel relative to
the pilot instruments.

2.4.2 Research pilot test - 

The pilot testing was done in spring 1972 in a sample of schools
across the D. C. Public School system. This pilot was necessary in 
order to observe student reactions to the response modes under study. 

a) Levels K-3 were included in the pilot project. 

b) A carefully designed sample of schools and students was develop- 
ed. 

c) The pilot study was coordinated by a staff member of the
Departments of Research and Evaluation. Conferences with principals
and teachers were required prior to the conduct of the study.

d) The pilot study required 12 professional staff members from the
Departments of Research and Evaluation and the Pupil Appraisal
Unit of the D. C. Public Schools and the California Test Bureau.

e) The processing of the data was handled by CTB after it had been
collected in the D. C. Schools.

f) The new instruments based on the findings of the pilot project
followed the regular review and revision procedures of
instruments for grades 4-9 and will be ready for administration in the
spring of 1973.

2.5 Progress review and project liaison - 

This activity began with the discussion of the original proposal.
This continuous communication was and is very time consuming, but
is of extreme importance. Contacts by mail, telephone, and on-site
consultations were required.

Progress review and project liaison are extremely important, for
the success of the entire project is dependent upon clear
communication, evaluation of work and appropriateness for the local school
system.

a) A coordinator must be, and was, designated. 

b) Communication with the CTB Project Director was a continual weekly,
and usually more frequent, event during the course of the
contractual agreement.

2.6 Develop instructional prescription tables -

The ultimate value of the prescriptive criterion-referenced
test lies in its support for teacher and student for the
individualization of instruction and improvement of the learning process.
Such value rests squarely upon absolute congruence between the
objectives and the instructional resources available for teacher
and student.

Instructional resources (such as texts and reference material,
cited by title and page number) must be identified for each
objective.

a) This necessary, time-consuming task which relates curriculum
materials to the test was undertaken by D. C. school personnel
as a part of the project.

b) As the coordinator I tried to get extra funding during the spring
and summer 1971 for the task; none was made available. Therefore,
Dr. James I. Guines, Associate Superintendent, Division of
Instructional Services, detailed 12 reading specialists from their
regular assignments to the task for 2 weeks in September 1971.

c) The task of getting all curriculum materials keyed is expensive
in terms of staff time but is a mandatory part of the project,
so not only staff of the Division of Instruction but also key staff
in the Departments of Research and Evaluation and staff in the
Department of Automated Information Services worked on this.

d) Several meetings to train individuals in the process were held 
by the CTB coordinator and the D. C. schools project coordinator. 

e) Instructional format and procedures for keying prepared by CTB
did not fit D. C. circumstances; therefore, adjustments were
made and the revised formats utilized.

f) The keying of a limited number of curriculum materials at each
level for grades 4-9 in reading and mathematics was
accomplished by D. C. school personnel and CTB staff before the test
administration in the fall of 1972.
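
The idea of a prescription table keyed to objectives can be sketched in Python as follows. This is an illustration only and does not reproduce the CTB or D. C. formats: the objective codes, the titles, and the page ranges are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    title: str  # text or reference material, cited by title
    pages: str  # page range within that title


# Hypothetical keyed table: objective code -> instructional resources.
PRESCRIPTION_TABLE = {
    "R-4.01": [Resource("Hypothetical Reader, Level D", "12-18")],
    "R-4.02": [
        Resource("Hypothetical Reader, Level D", "40-47"),
        Resource("Hypothetical Skills Workbook", "9-11"),
    ],
}


def prescribe(mastery, table):
    """List the keyed resources for each objective not yet mastered.

    mastery maps objective codes to True (mastered) or False.
    An objective with no keyed resources maps to an empty list.
    """
    plan = {}
    for objective, mastered in mastery.items():
        if not mastered:
            plan[objective] = list(table.get(objective, []))
    return plan


# Hypothetical test result for one pupil.
pupil_mastery = {"R-4.01": True, "R-4.02": False, "R-4.03": False}
plan = prescribe(pupil_mastery, PRESCRIPTION_TABLE)

# Objectives that were tested but never keyed to any material.
unkeyed = [obj for obj, resources in plan.items() if not resources]
```

An objective that appears in the test but not in the table comes back with no resources, which shows why the congruence between objectives and keyed materials described in this section matters.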

2.7.1 Develop report specifications -

The report specifications were an outgrowth of item 2.2. The
data treatment, the data to be reported and to whom, and the format were
agreed upon by the D. C. and CTB coordinators and appropriate D. C.
school staff. Format changes have been and will continue to be
made as a result of the utilization of the instruments.

2.7.2 Develop report programs - 

The data processing staff of the California Test Bureau design- 
ed and wrote the necessary computer programs for the above. 

2.7.3 Develop report forms -

The California Test Bureau's data processing and manufacturing
departments worked with the CTB coordinator in the development of
the necessary reporting forms.

2.7.4 Produce report forms - 

The manufacturing department of the California Test Bureau
contracted out the job and the report forms were printed.

2.8 Management and editing materials - 

The California Test Bureau staff assumed all responsibility for 
correctness of all copy for syntax, context and format. 

2.9 Produce instruments - 

The production of the test booklets PMT-DC and PRT-DC
was in the hands of the CTB Department of Manufacturing and Word
Processing. The booklets for levels D through H in reading (grades
4-9) and for levels E through H in mathematics (grades 5-9) were
produced and ready for the fall 1972 testing program of the D. C.
Public Schools.

Levels A through C in reading and levels A through D in math- 
ematics are being completed and will be ready for test administra-* 
tion in the spring of 1973. 

2.10.1 Develop training program for D. C. staff: 

The CTB staff and staff of the D. C. Public Schools' Pupil 
Appraisal Section developed training materials on the use of the 
prescriptive test results for the individualization of instruction. 
In addition to the materials, the plan for the training sessions 
was also developed. 

2.10.2 Produce training materials - 

The actual production of the training materials was done by 
CTB and D. C. Public Schools and provided to classroom teachers and 
other staff. 

2.10.3 Conduct training sessions - 

Staff of the California Test Bureau and the staff of the Pupil 
Appraisal Section of the D. C. Public Schools conducted training on 
the administration and use of the prescriptive test instruments 
with the testing chairmen of each elementary and junior high school 
of the D. C. Public School system. These testing chairmen, with 
assistance from Pupil Appraisal decentralized staff, subsequently 
conducted training sessions for the classroom teachers in their 
buildings prior to the spring 1972 grade 4 and 6 reading testing 
program and the fall 1972 city-wide testing at grade levels 4-9 
in reading and grade levels 5-9 in mathematics. 

2.11.1 Develop manuals - 
and 

2.11.2 Produce manuals - 

The California Test Bureau staff developed and produced manuals 
for teachers for each level of the tests. These manuals contain 
specific directions to teachers for preparing students to take 
the test, administering the test, and preparing test answer 
sheets and booklets for scoring. 

Manuals have been used for grades 4-9 in reading and grades 
5-9 in mathematics. Manuals for grades 1-3 in reading and mathemat- 
ics and in grade 4 in mathematics will be available when the PRT-DC 
and PMT-DC are used at those levels. 

These were the steps and actions in the criterion-referenced test 
development for reading and mathematics grades 1-9 in the Public Schools 
of the District of Columbia. A more complete description will gladly be 
given upon request. 



* * * * 



What problems were encountered in the District of Columbia in the 
project to develop criterion-referenced tests? The major ones were: 

(A) There existed a severe lack of resources to do the job. 

(B) Criterion-referenced test development was not given the level 
of priority in and by the school system commensurate with the 
task. 

(C) It was difficult for a long-term project of the magnitude of 
criterion-referenced test development to comply with a care- 
fully designed work flow chart when programs, priorities and 
objectives of the school system were going through a series 
of changes. In this kind of project many related tasks such 
as printing and data processing must be scheduled far in 
advance. 

What were the outstanding positive elements in the developmental 
project? Among them were: 



(A) The planning and working together of a great many operating 
departments of the school system to accomplish the criterion- 
referenced test development task. 

1) The exceptional cooperation and hard work of the 
Supervising Directors of Reading and Mathematics 
and their staffs of reading and mathematics spe- 
cialists and of classroom teachers in the develop- 
mental and review procedures and in the preparation 
of prescriptive data. 

2) The keen interest and the willingness of the Direc- 
tors and staff members in the Department of Auto- 
mated Information Systems and the Pupil Appraisal 
Section to perform required tasks in the develop- 
mental process of the criterion-referenced tests 
in addition to their assigned work-load. 



(B) The opportunity provided through this project to various school 
personnel including teachers to gain first-hand experience in 
test development for system-wide use; subsequently, the use of 
these tests and test results will have a major impact on the 
instructional program at the classroom level as well as on the 
overall instructional plan. 

(C) The positive reactions of the subject field specialists and 
most teachers as evidenced by written comments and oral state- 
ments which indicated that: 



1) They felt that at last tests were available that 
were directly related to the instructional objec- 
tives of the local school system. 

2) They felt that the tests had more appeal to students 
than the ones previously used. 

(D) The results of the fall administration of the tests gave to 
teachers diagnostic prescriptive information on each indivi- 
dual child that took the test, thus providing valuable in- 
structional assistance. 

I would very strongly make the following recommendations to school 
systems planning to develop criterion-referenced tests: 

(A) The school administration and the Board of Education must be 
committed to support the concept of criterion-referenced test- 
ing. 

(B) Adequate resources must be allocated to support the entire 
project. 

(C) A full-time coordinator or director with no other assignment 
should be appointed before any planning steps are undertaken. 

(D) The school system must have clearly defined instructional 
objectives. These objectives must be stated in measurable 
terms. 

(E) There must be a clear understanding by the school system 
of the purposes of various types of measurement and the 
appropriate use of various measures. 

In closing I would like to say that the only complaints I have had 
so far from teachers and principals in the D. C. school system refer 
(1) to the fact that not all books and materials are keyed and (2) to 
the fact that the 4th grade PMT test developed by CTB for national use 
was not appropriate for the District of Columbia. (The D. C. edition 
was not developed in time for the fall 1972 test administration 
program.) 

As the school system's coordinator I might summarize my reaction to 
the criterion-referenced test development project in this way: I feel 
that in addition to the value of the criterion-referenced tests to the 
instructional staff and program, as cited above, the comments from pupils 
have made the project completely worthwhile. They say, "We like these 
tests much better than the ones we used to take. They are much more 
interesting." 

December 1972 



The Development of 
Criterion-Referenced Tests 
in Reading 



Richard M. Petre 

Consultant in Reading 

Maryland State Department of Education 



The Maryland State Department of Education established reading as 
a priority approximately three years ago. One of the components of the 
plan was to implement the following as outlined by the Department's 
executive staff: 

- To ensure that each student and adult possesses 
the basic skills necessary to become an effec- 
tive citizen. 

- By 1977, 85 percent of all Maryland students 
will be able to use reading as a communicative 
skill as determined by appropriate criterion- 
referenced measures. 

A committee representing teacher-training institutions, local school 
systems, and the State Department worked with Dr. Roger Farr of Indiana 
University to determine the best approach to accomplish these objectives. 

To define the problem specifically, the committee introduced the 
following four objectives: 

- By 1977, all students enrolled in the public schools, exclud- 
ing permanent care institutional cases, who have completed an 
elementary school program will be able to use independently 
the communication purposes of reading outlined in Table I, 
Parts I and II.C.1. 

- By 1977, 85 percent of all students completing a public 
school elementary program will be able to go beyond 
objective 2.2a and will be able to demonstrate a variety 
of performances of the process of reading outlined in 
Table I, Parts I and II.C.1. as determined by appropriate 
criterion-referenced measurements. 

- By 1977, all students enrolled in the public schools, 
excluding permanent care institutional cases, who are 
15 years old will be able to use independently the 
communication process of reading outlined in Table I, Parts 
I and II.D.1. as determined by appropriate criterion- 
referenced measurements. 

- By 1977, 85 percent of all students completing a secondary 
school program will be able to go beyond objective 2.3a 
and will be able to demonstrate a variety of performances 
of the communication processes of reading outlined in 
Table I, Parts I and II.D.2. as determined by appropriate 
criterion-referenced measurements. 



The standard "85 percent of all students" was changed to a 100 per- 
cent level because some reading behaviors are needed by all people, with 
most students going beyond the basic level. The objectives also speci- 
fied due dates for mastery at the following levels: 12 year old, 15 
year old, or ready for high school graduation. 

In the firm belief that survival in society is the paramount reason 
for teaching reading in school, the committee selected functional reading 
as the priority emphasis. Five basic functional purposes for reading 
instruction were agreed upon: (1) following directions; (2) locating 
references; (3) personal development; (4) gaining information; (5) using 
forms. Table I on the following pages represents the combined efforts 
of the committee and other groups in Maryland who have agreed that the 
contents represent those survival reading tasks which are essential. 
This content source outline was used by SEE, Inc., of Bloomington, 
Indiana as the basis for writing the criterion-referenced tests to be 
used in Maryland. 

We believe that Table I is one of the most comprehensive lists of 
survival tasks based on societal demands in the literature on reading. 
We also feel that if Maryland students can perform these reading tasks 
they have the basic reading knowledge needed to function in society as 
well as those basic skills which will enable them to handle new reading 
situations throughout their lifetime. 

SEE, Inc. prepared for us three criterion-referenced tests: basic 
and advanced items for mastery by 12 year-olds, basic items for mastery 
by 15 year-olds, and advanced items for mastery before high school grad- 
uation. Samples from these tests are given below: 



SAMPLE ITEMS ON THE CRITERION-REFERENCED TEST: 
Directions : 

Read each of the following questions carefully and mark the answer 
that describes you best. 

1. How much time do you spend reading for fun during vacations? 

a. None 

b. One to three hours a week 

c. Three to six hours a week 

d. More than six hours a week 



2. How do you feel about reading as a spare time activity? 

a. I enjoy it 

b. I can take it or leave it 

c. I'd rather do something else 

d. I don't like it at all 

This newspaper index will help you answer questions 3 and 4. 



THE INSIDE STORY 

Fair, Colder 
Slightly Warmer 
On Thursday 

(More Weather on Page A12) 

MICHIGAN CLINCHES 1ST PLACE in Big 10 Conference with 79-67 win 
over Northwestern. This and other sports on Page B4. 

INSURANCE IN WORKS for city employees. Read this C-T editorial 
today on Page A10. 

Ann Landers       A9          Movies          B2 
Bridge Column     B13         Obituaries      B14 
Business News     B3          Question Girl   A5 
Classifieds       B10-11      Sports          B3-7 
Comics            B12-13      State News      B9 
Crossword         B13         Statistics      A12 
Editorials        A10-11      Television      B12 
Family Living     A8-9 

3. Which of the following would be found on page B12? 

a. Baseball scores 

b. Weather information 

c. The time of a T.V. program 

d. A crossword puzzle 

4. What section would you look at to find the cartoons? 

a. Comics 

b. Movies 

c. Sports 

d. Television 

The grocery ad in the box is for items 5 and 6. 



[Grocery ad] 

FRESH Fish Steaks                                 2 lb.         $1.19 
Deep Sea Favorite Frozen Turbot, 3 lb. or more    lb.           69c 
Select Sliced Beef Liver                          lb.           59c 
Honeysuckle Young Turkey, 12-14 lb. avg.          lb.           49c 
Skinless Wieners                                  12 oz. pkg.   59c 
Large Bologna, By the Piece                       lb.           59c 
GROUND BEEF                                       [price panel partly illegible] 

Mark A for True and B for False. 

5. Turbot is a sea food. 

a. True 

b. False 

6. The cheapest meat per pound in the market is hamburger. 

a. True 

b. False 

Mary just bought the groceries. Use the grocery tape in the box to the 
left to answer each question below. 



00.39  GR 
00.43  GR 
00.72  GR 
00.22  GR 
00.16  GR 
00.17  GR 
00.17  GR 
00.16  GR 
00.33  GR 
03.65  MT 
05.88  MT 
00.25  TX 
12.53  sb tl 
00.46  GR 
00.36  GR 
00.02  TX 
13.38  tl 
20.00  ca td 
06.62  CG 

THANK YOU 


7. How much money did Mary give the cashier? 

a. $12.53 

b. $13.38 

c. $20.00 

d. $ 6.62 

8. How much change did Mary get? 

a. $12.53 

b. $13.38 

c. $20.00 

d. $ 6.62 


Use the oven operating instructions in the box to answer questions 9 and 
10. 



Mrs. Jones has just bought a new electric range. Here are the 
operating instructions for the oven. 

1. The reading on the oven thermostat dial shows BAKE 
area from 150° to 500° and BROIL area from 375° to 
"Broil." 

2. Baking 

Turn dial to desired temperature. If the dial is 
set above 300° both broil and bake elements stay on 
until desired temperature is reached when the broil 
element goes off. You will know when the desired 
temperature is reached as the indicator light will 
go off. 



Mark the letter on your answer sheet that is the best answer to each 
question. 

9. What is the temperature range for BAKING? 

a. 100° to 200° 

b. 150° to 300° 

c. 300° to 375° 

d. 150° to 500° 

10. How does this electric range preheat the oven quickly? 



a. Both broil and bake elements stay on 

b. The broil element stays on 

c. Both broil and bake elements stay off 

d. The bake element stays on 






In the box on the left is a list of companies who are looking for 
employees. To answer questions 11 - 14 mark the letter on your 
answer sheet which corresponds to the job for which that person 
is best qualified. 



OPENING for experienced lino- 
type operator and a composi- 
tion floor or lock-up man. 
All inquiries kept confidential. 
Contact Mr. Bobby Hall, Mid- 
land Press, Inc., Spencer, 
Maryland. 



EXPERIENCED COOK needed. Lunch 
and dinner hours. Good pay, 
good working conditions. Ref- 
erences required. Apply in 
person. MIKE'S CAFE, 217 N. 
Walnut. 



IN DESPERATE need of lead 
guitar player for rock group. 
Please phone 336-5166 or 339- 
8317. 



RN or LPN full time or part 
time 7-3:30 and 3-11:30 shift 
available. Pay commensurate 
with experience. 339-1657, 
after 5 p.m. 336-5570 



EVENING COOK WANTED. Hours 
12 noon to 8:30 p.m. Will 
train. Good starting wage and 
benefits. Apply in person 
between 2 p.m. and 4 p.m. 
ABC CAFETERIA, College Mall. 



11. Jane Mac Donald 

Registered Nurse 

12. Mr. J. Fish 

Experienced 

13. Bobbie Haven 

Guitarist 

14. D. Y. Nelson 

Printer 



Our first statewide sampling of the test items was conducted in May, 
1972. This was the second trial for the test items. The following results 
were compiled: 

Basic Test:    Includes those items we want 100% of the students to 
               achieve. (Successful performance of the items was 80% 
               or above.) 

                          No. of      % of Students 
Age                       Students    Successful 

12 year old                 470           65 
15 year old                 416           84 
High school graduate        301           85 

Advanced Test: Includes those items we want most students to also 
               achieve. (Successful performance of the items was 
               80% or above.) 

                          No. of      % of Students 
Age                       Students    Successful 

12 year old                 512           45 
15 year old                 416           65 
High school graduate        209           72 
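
The "% of Students Successful" figures above can be read as a simple mastery count. The following is a minimal sketch in Python, assuming (the report does not say so explicitly) that a student counts as successful when he or she answers at least 80 percent of the items correctly. The function name, the ten-item test length, and the five scores are hypothetical and are not Maryland data.

```python
def percent_successful(raw_scores, n_items, cutoff_percent=80):
    """Percentage of students whose score meets the mastery cutoff.

    raw_scores      -- number of items answered correctly, one entry per student
    n_items         -- number of items on the test (assumed equal for everyone)
    cutoff_percent  -- mastery criterion; 80 is the assumed reading of the table
    """
    if not raw_scores:
        raise ValueError("no students tested")
    # Integer comparison avoids floating-point trouble exactly at the cutoff.
    passed = sum(1 for score in raw_scores if score * 100 >= cutoff_percent * n_items)
    return round(100 * passed / len(raw_scores))


# Hypothetical example: five students on a ten-item test.
example_scores = [10, 8, 7, 9, 5]
print(percent_successful(example_scores, n_items=10))  # 3 of 5 students reach 80%
```

Under this reading, the gap between a table entry and the 100 percent goal for the basic test is the share of students still below the mastery cutoff.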

As you know, we in Maryland are exploring a different approach to 
reading. We have chosen to emphasize survival reading. Accountability 
for results is based on the following: 

(1) Declaring specific goals and behaviors for three age groups. 

(2) Testing observable performance on items needed for survival 
reading instead of those skills usually measured on standardized 
tests. 

(3) Planning instructional decisions based on the results of testing 
described above. 

Currently, we are planning to test in each local school system. A 
second form of the criterion-referenced measurements will be constructed. 
Guidelines will be written to help local school systems implement 
functional reading as a part of their already ongoing reading programs. 

Our State priority is based on the belief that a reader is one who 
not only can read but does read. Logically, then, the place to start is 
with functional reading needs. Thus, our effort and this working report. 

The Development of a 
Criterion-Referenced Test 
of Mathematics 



Fanny Freytes 

Director of Program Evaluation in Mathematics 
Department of Education, Puerto Rico 



INTRODUCTION 

During the last few years the Puerto Rico State Department of Educa- 
tion has made great efforts to provide the educational system of Puerto 
Rico with a series of adequate standardized tests which are essential for 
assessment and evaluation purposes. These efforts have included the de- 
velopment of standardized achievement tests for all levels in almost all 
of the curriculum areas. However, in Puerto Rico as well as the United 
States, with the advent of increased federal aid to education and the sub- 
sequent emphasis in the areas of educational planning, individualized in- 
struction and accountability, there developed an awareness and a concern 
for the need of other types of instruments, namely criterion-referenced 
tests. Criterion-referenced measures have been considered particularly 
desirable in areas where diagnostic information is needed, such as place- 
ment of individuals in programs of instruction, formative evaluation of 
educational programs, and in evaluative assessment of individual or group 
achievement. 

A first attempt toward the development of a criterion-referenced 
test has been undertaken as a joint project of the Division of Evaluation 
and the Mathematics Program in the Department of Education. The project 
is concurrently developed with an accountability project in the area of 
mathematics at the seventh grade. As expected in an accountability pro- 
ject, information regarding specific skills and knowledge development is 
required in addition to the normative base for comparisons and interpre- 
tations. 

Accomplishments 

As an initial step in the development of this instrument, a careful 
review of the literature on the subject of criterion-referenced testing 
was done. The term "criterion-referenced" appears to have been introduced 
by Robert Glaser (1963) in a paper in which he distinguishes "criterion- 
referenced" from "norm-referenced" testing. In the latter, an individual's 
test performance is interpreted with respect to the performance of other 
individuals who belong to some specified population. In contrast, the 
interpretation of an individual's performance on a criterion-referenced 
test is a behavioral statement that is made without reference to the per- 
formance of other individuals. 

Although a considerable amount of attention has been given to the sub- 
ject of criterion-referenced measures, there are very few guides available 
to the constructor of this type of instrument and in some cases the pre- 
vailing ideas are of a contradictory nature. In 1970, Dr. Stephen Klein 
from the UCLA Center, in his analysis of the relative efficiency of tests 
as vehicles for providing information for decisions about students and the 
educational programs they receive, suggested a four-step procedure which 
intends to combine the better components of the norm and criterion refer- 
enced test approaches. The essential characteristic of this approach is 
that it includes the concepts of item difficulty and normative score re- 
porting in the development and interpretation of criterion based measures. 
This approach includes the following steps: 1) Specification of objec- 
tives, 2) Developing test items for each objective, 3) Developing test 
items to measure related objectives, 4) Providing score and score inter- 
pretation for each objective. Each of these steps will be taken up in 
turn. 

The purposes of the Accountability Project which the mathematics test 
is intended to serve, as stated in the "Tentative Draft of Accountability 
Model for Mathematics Achievement" (See Appendix A), require information 
on the progress attained by individual pupils, by classroom, by school, 
by level, by zone, by district, by region, by region-levels and by regions- 
zones as compared with the criteria for normal achievement. It was with 
these purposes in mind that it was decided to follow Dr. Klein's suggested 
approach in the development of this instrument. 

Clarification of Objectives 

The goals for the teaching of mathematics in the elementary grades 
of the school system of Puerto Rico are clearly specified by the Depart- 
ment of Education and apply to all children at the elementary level. The 
mathematics test to be developed will be based upon these objectives and 
will attempt to measure to the greatest degree possible the specific 
skills, concepts, abilities and knowledges expected to be acquired by the 
end of the sixth grade. 

The test will be divided into sub-parts according to the grouping 
of objectives by content areas. These are: Basic Concepts of Set Theory, 
Numeration Systems, Operations, Geometry, Measurement and Graphs. 

Development of Test Items for Each Objective 

Approximately 50 items have already been developed by program speci- 
alists who have had previous training in item writing and test construction 
techniques and were further reviewed by personnel from the Division of 
Evaluation. 

One hundred or more items are being developed at present. In the 
development of these items special care is being taken to have a good re- 
presentative sample of the total population of items that might be used 
to measure the objectives considering both the range of formats and the 
range of item difficulty. 

Development of Test Items to Measure Related Objectives 

Learning in mathematics goes through sequential stages: the under- 
standing of one concept, the acquisition of one skill, is basic to another. 
If the foundation is poorly made, the structure as a whole will be weak 
and inadequate. Thus, it is very important to assess performance on 
objectives that are easier or more difficult to master as well as on 
the ones of major interest. This point is being carefully consider- 
ed in the development of items previously described. As indicated by Dr. 
Klein in his paper on this subject, the reasons for measuring these kinds 
of related objectives are that they (a) provide information about the 
unanticipated outcomes of educational programs, (b) indicate how close a 
program (or student) came to meeting or surpassing the objectives, and (c) 
show the level at which subsequent educational treatments should be pitch- 
ed. 

Providing a Score and Score Interpretation for Each Objective 

As the interest of this project is specifically directed toward 
attainment of specific objectives, the results on the test will be analyzed 
in terms of these objectives. The information provided by the test should 
reflect both criterion and norm-referenced performance on the items design- 
ed to measure the objectives. As previously indicated, the mathematics 
test being developed will encompass the six global objectives listed for 
the sixth grade. The idea is to have subscores for each part of the test 
designed to measure each objective. 

The statistical data relating to analysis of test results in terms 
of objectives of the mathematics program will be grouped into two major 
modalities. First of all, statistics on achievement in accordance with 
the total number of items assigned by parts and subparts will be made. 
Under the criterion of score norms, statistics will be compiled (1) on the 
median, first and third quartile scores by subparts, (2) on the percentage 
of items correctly answered in each sub-part, (3) on an item-by-item tab- 
ulation of percentage of errors, and (4) on the percentage level of 
achievement in accordance with score norms. 
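
The first three compilations listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each pupil's answers to one sub-part are scored 1 (correct) or 0 (error), the quartiles use the "inclusive" interpolation rule of the standard `statistics` module (the paper names no rule), and the function name and the four-pupil data set are hypothetical. Statistic (4), achievement against score norms, is left out because the norms themselves are not given here.

```python
from statistics import median, quantiles


def subpart_summary(responses):
    """Summarize one sub-part of the test.

    responses -- one list per pupil, holding 1 (correct) or 0 (error) per item.
    Needs at least two pupils, all with the same number of items.
    """
    n_pupils = len(responses)
    n_items = len(responses[0])
    totals = [sum(pupil) for pupil in responses]  # sub-part score per pupil

    # (1) Median, first and third quartile of the sub-part scores.
    q1, _, q3 = quantiles(totals, n=4, method="inclusive")

    # (2) Percentage of all item responses answered correctly.
    pct_correct = 100 * sum(totals) / (n_pupils * n_items)

    # (3) Item-by-item percentage of errors.
    pct_errors = [
        100 * sum(1 - pupil[i] for pupil in responses) / n_pupils
        for i in range(n_items)
    ]

    return {
        "median": median(totals),
        "q1": q1,
        "q3": q3,
        "pct_correct": pct_correct,
        "pct_errors_by_item": pct_errors,
    }


# Hypothetical sub-part: four pupils, four items.
sample = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]
print(subpart_summary(sample))
```

Running the same function once per sub-part (Set Theory, Numeration Systems, and so on) would give the subscores by content area that the project calls for.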

Statistical data will also be presented tending to indicate the 
degree of attainment of objectives in terms of "expected" achievement. 
The criterion of expected level of achievement will be determined by 
program specialists at the State Department level and by experienced 
teachers at the local level. They will make their judgments on the basis 
of attainment they expect students to make at the end of the school year 
on each item of the test: 100 percent, 75 percent, 50 percent or less 
than 50 percent. The statistical compilations will include (1) the per- 
cent level of expected achievement item by item as determined by teachers 
and program specialists, (2) a comparison between median obtained scores 
by sub-parts with the criterion of expected median scores, and (3) a 
comparison between the expected and attained percentage of level of 
achievement. 
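
The item-by-item comparison of expected and attained achievement described above might be tabulated as in the following Python sketch. It assumes that the judges' ratings of 100, 75 and 50 percent are used directly as numeric targets and that the "less than 50 percent" rating, which carries no numeric target, is recorded as `None`; both choices, the function name, and the sample figures are hypothetical.

```python
def compare_expected_attained(expected_pct, attained_pct):
    """Pair each item's judged expectation with the observed percent correct.

    expected_pct -- per item: 100, 75, 50, or None for "less than 50 percent"
    attained_pct -- per item: observed percentage of pupils answering correctly
    """
    rows = []
    for item, (expected, attained) in enumerate(zip(expected_pct, attained_pct), start=1):
        if expected is None:
            # No numeric target was set, so no difference can be computed.
            difference, met = None, None
        else:
            difference = attained - expected
            met = attained >= expected
        rows.append({
            "item": item,
            "expected": expected,
            "attained": attained,
            "difference": difference,
            "met": met,
        })
    return rows


# Hypothetical ratings and results for four items.
for row in compare_expected_attained([100, 75, 50, None], [92.0, 80.0, 41.5, 30.0]):
    print(row)
```

Items with a negative difference are the ones where pupils fell short of what specialists and teachers expected by the end of the school year.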

We should then be able to determine with some degree of accuracy 
how closely sixth grade children are achieving goals in mathematics in 
terms of norm scores and in terms of expected scores and will be able 
to use this information as the data base for the assessment of progress 
attained by students in the seventh grade, which is the grade to be con- 
sidered in the Accountability Project previously mentioned. 